MANAGEMENT OF SPEECH TECHNOLOGY MODULES
IN AN INTERACTIVE VOICE RESPONSE SYSTEM



FIELD OF INVENTION 



This invention relates to the management, in an interactive voice 
response system, of a plurality of speech technology modules. In particular 
it relates to an apparatus and a method for dynamically determining which 
of a plurality of speech technology modules to use during voice interaction 
between the system and a user. 

BACKGROUND OF INVENTION

In today's business environment, the telephone is used for many
purposes: placing catalogue orders, checking airline schedules, querying
prices, reviewing account balances, notifying customers of price or
schedule changes, and recording and retrieving messages. Often, each
telephone call involves a service representative talking to a caller,
asking questions, entering responses into a computer, and reading
information to the caller from a terminal screen. This process can be
automated by substituting an interactive voice response system with speech
recognition for the operator.

An interactive voice response (IVR) system may have more than one speech
recognition engine so that, for a particular interaction, the best engine
may be selected and used. Patent publication WO 98/10413 discloses a speech
processing system, and a method concerning it, in which the system
comprises a number of speech recognition modules and speech output modules,
each provided for a given type of speech recognition or speech input.
Depending on the application, the modules are selected, activated and
parameterized by a module selection device.

In that system the application defines the parameters needed for speech
recognition. Focusing on the function of the particular interaction in this
way ignores the fact that certain speech recognition engines are better
than others for certain languages. Each module in the prior art is
configured for a specific application. Most recognition engines have
different lexicons for the range of functions, but it is the full lexicon
that can determine an engine's suitability for a language. For instance, one
type of speech recognition engine is preferred for certain languages,
whereas IBM ViaVoice is a good general all-rounder. Choosing one speech
recognition module according to the application alone is not entirely
satisfactory when the application may be multilingual, and there is a need
for improvement.



SUMMARY OF INVENTION 



In one aspect of the present invention there is provided an 
interactive voice response (IVR) system as described in claim 1. 



The capabilities required of a recogniser at a particular point in an
application go beyond simply the "task" that the application is trying to
achieve; it is also necessary to take into account the environment within
which the application is currently operating. Relevant factors in the
environment include the language of the dialog, the regional accent, and
the characteristics of the telephone voice channel.

The environment is a property of each interaction and may be modified
by the application programmer. The environment is also a property of the
application. Knowledge of the actual speech recognition technology is not
needed within the application, since the speech recognition technology
depends on the environment property, which is part of the application.
Furthermore, the environment is a dynamically modifiable property that may
be changed by the application during execution.



A further complexity with speech recognition over the telephone is 
the type of phone network that the call originates from. The 
characteristics of the voice channel from a land line telephone are
different from those from a mobile telephone, and those from a telephone
using Voice over IP (Internet telephony) are different again. This is due
to the use of differing voice compression techniques. In order to provide 
accurate and reliable speech recognition it is necessary to have a language 
model for the recogniser that matches the characteristics of the voice 
channel, so a different model is needed for each of land line, mobile and 
IP telephony. In the prior art, it would be necessary to attach one 
recogniser for each of these language models to every call to allow for 
calls from all types of telephone. 

BRIEF DESCRIPTION OF DRAWINGS

In order to promote a fuller understanding of this and other aspects 
of the present invention, an embodiment will now be described, by way of 
example only, with reference to the accompanying drawings in which: 

Figure 1 represents an IVR platform with integrated ASR modules of
the present embodiment; and

Figure 2 represents the steps of the method of the present
embodiment.

DESCRIPTION OF PREFERRED EMBODIMENT 

Referring to Figure 1, there is shown a schematic representation of a
voice processing system 10 of the present embodiment comprising voice
processing platform 12, voice processing software 14 and telephone
lines 18. Voice processing system 10 supports up to 60 E1 or 48 T1
telephone lines 18 connected through network interface 28 to connect
callers to the voice system. The telephone lines can come directly from the
public telephone network or through a private branch exchange (PBX) 30. If
call volumes require more than 60 E1 or 48 T1 lines, additional voice
processing systems can be connected together through a LAN and managed from
a single node.



The voice processing platform 12 comprises: a personal computer with
an Industry Standard Architecture (ISA) bus or a Peripheral Component
Interconnect (PCI) bus 13, running Microsoft Windows NT; one or more
Dialogic or Aculab network interface cards 28 for connecting the required
type and number of external telephone lines 18; and one or more Dialogic
voice processing cards 46. A dedicated voice data bus, System Computing
Bus (SCbus) 23, connects the network card 28 and the DSP card 46, which
avoids data flow congestion on the PCI system bus and increases voice
processing speed. A hardware ASR would receive voice data over the SCbus
23.

The voice processing software 14 comprises IBM's DirectTalk Beans for
Windows (previously known as Voice Response Beans), which is a powerful,
flexible, yet cost-effective voice processing software. Although the
embodiment is described for Windows, an equivalent platform is also
available for the UNIX environment from the IBM Corporation, in which case
a maximum of 12 digital trunks per system (360 E1 or 288 T1 channels) may
be supported. Used in conjunction with voice processing hardware, the
voice processing software can connect to a public telephone network
directly or via a PBX. It is designed to meet the need for a fully
automated, versatile computer telephony system. DirectTalk Beans not only
helps develop voice applications, but also provides a wealth of facilities
to help run and manage them. DirectTalk Beans can be expanded into a
networked system with centralized system management, and it also provides
an open architecture, allowing customisation and expansion of the system at
both the application and the system level.






The voice processing software 14 comprises: a telephony server 40;
speech recognition servers 48A and 48B; a dialogue manager (DM) server 50;
a natural language understanding (NLU) server 52; a development work area
(not shown); an application manager (not shown); a node manager (not
shown); a general server interface 38; voice application 16; a module
parameter database 22; voice segments 24; text segments 26; speech
technology selector 60; and speech parameter selector 62. The general server
interface 38 manages all communications between the component programs of
voice processing software 14. A server is a program that provides services
to the voice response application 16. The modular structure of the voice
processing software 14 and the open architecture of the general server
interface (GSI) allow development of servers that are unique to specific
applications. For example, a user-defined server can provide a bridge
between Voice Response and another product.

Telephony server 40 interfaces the network interface 28 and provides 
telephony functionality to the voice response application. 






The speech recognition servers 48A and 48B are large-vocabulary,
speaker-independent, continuous ASRs such as IBM ViaVoice or Dragon Systems
ASR. In this embodiment the speech recognition servers are the speech
technology modules, although speech recognition is only one example of a
speech technology; other functions such as tone recognition or
text-to-speech could equally serve as the speech technology. Although the
speech recognition servers 48A and 48B are implemented in software in this
embodiment, one or both could be embodied in hardware.

Natural language understanding (NLU) server 52 interprets the user's
intention as expressed by key items (words or phrases) in the text output
from the ASR. The NLU server is based on the IBM ViaVoice NLU Toolkit.
DM server 50, based on the IBM ViaVoice NLU Toolkit, compares the NLU
output of the user's intentions against the information requirements of the
specific application or service and directs the IVR application to play an
appropriate response or prompt to the user.

The development work area allows the creation and modification of a
voice processing application. The application manager executes the voice
response application. The node manager allows monitoring of the status of
application sessions and telephone lines and allows the issue of commands
to start and stop application sessions.

Voice response application 16 controls the interaction between the
voice system 10 and a caller. Applications comprise Telephony Java Beans,
which incorporate the power and ease of use of the Java programming
language. A voice application controls the playing of recorded voice
segments 24 or synthesized text segments 26. The voice processing system
can run up to sixty applications simultaneously, ranging from one voice
response application running on all sixty lines to sixty different voice
response applications 16, each running on a separate line.



Speech technology selector 60 consists of a mapping configuration of
environment properties to speech recognition modules 48A and 48B. During
initialisation, the speech technology selector 60 reads the configuration
and loads it into a hashtable with the environment (e.g. locale) as the
key.
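
A minimal Java sketch of this lookup follows, assuming the configuration has already been read into a map from locale string to module name. The class name SpeechTechnologySelector and the module names are hypothetical and are not taken from DirectTalk Beans.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch of speech technology selector 60. The environment key is
// a locale string such as "en_GB"; the value names a speech technology module.
class SpeechTechnologySelector {
    private final Hashtable<String, String> technologyByLocale = new Hashtable<>();

    // Initialisation: load the mapping configuration into the hashtable.
    SpeechTechnologySelector(Map<String, String> configuration) {
        technologyByLocale.putAll(configuration);
    }

    // Returns the module configured for this locale, or null if none is configured.
    String technologyFor(String locale) {
        return technologyByLocale.get(locale);
    }
}
```

Keeping the mapping in configuration rather than in the application is what allows the technology for a locale to be changed without touching application code.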






Speech parameter selector 62 has configuration information 
consisting of a series of mappings from a combination of locale and task to 
the corresponding recognition parameters. During initialisation, and 
optionally when updated at a later time, speech parameter selector 62 reads 
the configuration and loads it into a hash table using a combination of the 
Locale and Task as the key. 
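
The composite key can be sketched in the same way. In this hypothetical sketch the locale and task are joined with an assumed "|" separator, and the parameter names ("vocabulary", "subVocabulary") are illustrative only.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch of speech parameter selector 62.
class SpeechParameterSelector {
    private final Hashtable<String, Map<String, String>> parametersByKey = new Hashtable<>();

    // Combines locale and task into one key; "|" is an assumed separator
    // that does not occur in locale or task names.
    static String key(String locale, String task) {
        return locale + "|" + task;
    }

    // Used at initialisation and, optionally, when the configuration is updated later.
    void load(String locale, String task, Map<String, String> parameters) {
        parametersByKey.put(key(locale, task), Map.copyOf(parameters));
    }

    // Returns the recognition parameters, or null if this combination is not configured.
    Map<String, String> parametersFor(String locale, String task) {
        return parametersByKey.get(key(locale, task));
    }
}
```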






When the application runs and requires speech recognition the current 
environment is passed to the speech technology selector 60. Speech 
technology selector 60 looks up the locale in the hash table and finds out 
which technology is required. The speech technology selector 60 creates an 
instance of the selected technology module and passes the request for 
speech recognition or text-to-speech to that instance. 
The speech parameter selector 62 then combines the Locale and Task to
create a key for its configuration hash table and looks up the parameters
for the required speech recognition. It then starts the recognition engine.
Once the recognition technology is started, the voice response unit starts
playing the prompt to the caller, possibly using text-to-speech, waits for
a response from the recogniser and passes on the response at the end.
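
The run-time sequence of the two preceding paragraphs can be sketched as follows. This is an illustrative assumption, not the DirectTalk Beans implementation: RecognitionModule, RecognitionRequestHandler and the prompt callback are hypothetical names, and a real module would drive a recognition engine rather than return a value directly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical interface for a pluggable speech technology module.
interface RecognitionModule {
    void start(Map<String, String> parameters); // start the recognition engine
    String awaitResponse();                     // wait until the recogniser returns a result
}

// Hypothetical handler: select by locale, create an instance, look up
// parameters by locale and task, start, prompt, then wait for the response.
class RecognitionRequestHandler {
    private final Map<String, Supplier<RecognitionModule>> factoriesByLocale = new HashMap<>();
    private final Map<String, Map<String, String>> parametersByKey = new HashMap<>();

    void registerTechnology(String locale, Supplier<RecognitionModule> factory) {
        factoriesByLocale.put(locale, factory);
    }

    void registerParameters(String locale, String task, Map<String, String> parameters) {
        parametersByKey.put(locale + "|" + task, parameters);
    }

    String recognise(String locale, String task, String prompt, Consumer<String> playPrompt) {
        Supplier<RecognitionModule> factory = factoriesByLocale.get(locale);
        if (factory == null) {
            throw new IllegalStateException("no speech technology configured for " + locale);
        }
        RecognitionModule module = factory.get(); // new instance of the selected technology
        Map<String, String> parameters = parametersByKey.get(locale + "|" + task);
        if (parameters == null) {
            throw new IllegalStateException("no parameters for " + locale + " / " + task);
        }
        module.start(parameters);      // start the engine first
        playPrompt.accept(prompt);     // then play the prompt (possibly via text-to-speech)
        return module.awaitResponse(); // wait for the recogniser and pass the result on
    }
}
```

Creating a fresh instance per request keeps per-call state out of the shared configuration tables; whether a real system would pool engine instances instead is left open here.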






IBM DirectTalk Beans are fully compatible with the JavaBeans™
specification, allowing anyone who can use a visual application builder
such as IBM VisualAge for Java, or the Java programming language, to
provide a telephone interface to business applications. The DirectTalk
Beans General Information and Planning guide provides more background
information about how these beans can fit into an enterprise system, and
what kinds of application can be written with the beans.

A JavaBean is a reusable Java component. JavaBeans, commonly referred
to as "beans", can be low-level or combined into higher-level beans that
can be used over and over to create applications.

The DirectTalk Beans are high-level beans for components of a typical
voice response application, such as Menu, MenuItem, Form, Announcement, and
EntryField. These beans are similar to the beans used for creating
graphical user interfaces: Menu, MenuItem, Label, TextField, and so on. Any
differences in terminology reflect the essential differences between
auditory and visual interfaces. The essential similarities between the
tasks of building a graphical user interface and building an auditory user
interface have been fully exploited: it should be easy for someone who has
developed a graphical user interface to develop an equivalent interactive
voice response application to support the same end-user tasks, for example,
to add, change, delete, and review records in a database. The DirectTalk
Beans comprise the DirectTalk bean, Action beans, Media data beans, the
EventFilter bean, and Compatibility beans.






The DirectTalk bean establishes communication between a Java voice
application and the base DirectTalk system, and also provides access to
simple call control functions: waiting for a call, returning a call when
finished, making an outgoing call, and handing a call over to another
application. Other call control functions, such as call transfer and
conferencing, are handled by the call control beans.



The action beans can be thought of as "voice interface widgets"
which use an action method to present or "play" output to the caller and,
in some cases, accept input (which can be key input or voice). Menu allows
a caller to press keys or say command words to select an option. The Menu
bean itself handles all the usual error conditions: timeout, invalid key
entered, and so on. MenuItem includes both the voice description for the
item, and the telephone key or speech recognition word that is associated
with selecting it. EntryField allows the caller to enter data by pressing
keys or speaking. The EntryField includes both the voice description and
the length of time allowed and, if necessary, the key to be recognized as
an ENTER key. The EntryField bean can easily be extended to allow checking
for various formats: currency, date, credit card number, and so on.



Media data beans specify the sounds that the caller hears. To help
you to create international applications, these beans have a locale
property, which specifies country and language if applicable. A style
property allows different presentation styles when appropriate. The media
data beans cannot "play" themselves to the caller: they are played by the
Action beans. For example, VoiceSegment specifies recorded voice data. The
VoiceSegment bean is used both for output and for input recorded by the
VoiceRecorder bean. AudioCurrency specifies a currency amount to be spoken.



The EventFilter bean: normally you would take different actions
depending on whether a bean succeeds or fails. The EventFilter bean
makes it possible to fine-tune the next action according to
the exact completion code from a bean.



The Compatibility beans are provided only to maintain compatibility
with DirectTalk for AIX state tables and DirectTalk/2 or DirectTalk for
Windows user actions. The recommended approach to new voice application
development is to use the Java beans for all applications. If the beans
don't do what you want, contact us using the address at the back of this
book or on the web page, and we'll consider any suggestions. But if you
already have voice applications that you want to integrate with your new
Java applications, the compatibility beans provide you with the means. For
example, the StateTable compatibility bean allows you to invoke a
DirectTalk for AIX state table. The UserAction compatibility bean allows
you to invoke a DirectTalk/2 or DirectTalk for Windows user action. The
UserActionVariable compatibility bean allows a DirectTalk/2 or DirectTalk
for Windows user action to access system variables.






All of the base DirectTalk products use the concept of "language",
and some languages are defined in terms of language and the country it is
spoken in: for example, Canadian French as opposed to French spoken in
France. In Java, this is explicitly acknowledged by using the term locale
to refer to language-and-country. Each locale is identified by an
ISO-defined code, which comprises a language component and a country
component: for example, fr_CA for Canadian French and fr_FR for French in
France. Optionally, personalized locale identifiers can be created,
including an optional user-defined variant, for example en_US_fred. The
maximum length of a locale identifier is 18 characters, so the variant part
can be up to 12 characters.



Locale is used for identifying precisely what is to be spoken to the
caller: the words, accent, and phrasing used (by using the variant
component of the locale identifier, you can also specify any other
characteristics you want). In other words, locale is used to identify the
voice segments to be used. It is also used for determining the technology
to be used for text-to-speech and speech recognition. Furthermore, it may
optionally alter the logic of your application.

A Locale has the format ll_cc_vvvv, where: ll is the language
code, such as "en" for English or "fr" for French; cc is the country code,
such as "US" for the United States of America or "GB" for the United
Kingdom; and vvvv is an arbitrary string of arbitrary length, such as
"mobile", "land line" or "Yorkshire". The language and country codes are
usually taken from those defined in the ISO standard.
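
A sketch of splitting an identifier in this format into its three components follows, assuming that any underscore after the country code belongs to the variant. LocaleId is a hypothetical class and is not java.util.Locale.

```java
// Hypothetical value class for the ll_cc_vvvv identifier; not java.util.Locale.
final class LocaleId {
    final String language; // ll, e.g. "en"
    final String country;  // cc, e.g. "GB"; empty when absent
    final String variant;  // vvvv, e.g. "mobile"; empty when absent

    private LocaleId(String language, String country, String variant) {
        this.language = language;
        this.country = country;
        this.variant = variant;
    }

    static LocaleId parse(String id) {
        // A limit of 3 keeps any further underscores inside the variant.
        String[] parts = id.split("_", 3);
        if (parts[0].isEmpty()) {
            throw new IllegalArgumentException("missing language code in \"" + id + "\"");
        }
        String country = parts.length > 1 ? parts[1] : "";
        String variant = parts.length > 2 ? parts[2] : "";
        return new LocaleId(parts[0], country, variant);
    }
}
```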



Each IVR system node has a default locale and, optionally, each
application can have a default locale, overriding the system default
locale. An application developer can specify the default locale in the
Application Properties. Furthermore, it is possible to override the system
default locale when the application is executing. When the application
starts, the current locale is either the system default locale or the
application default locale, if one has been specified. The current locale
affects only the running of the voice application: it does not affect
anything else in the Java Runtime Environment.

Each bean can have its locale property set, but to make an application
completely language-independent, one should not specify the locale on any
Media data beans. To make the application speak a specific language, simply
set the default locale of the system: all applications in that node will
speak that language. Instead of using the system default locale, one could
specify a default locale for each application, and have applications
speaking different languages running in the same system. Thus, the current
locale provides full internationalization for your applications, provided
that each DirectTalk Beans language is installed on the voice response
node.



Normally, the current locale only affects the voice segments chosen 
to be played to the caller. But one can access the locale currently being 
used by the application, to determine the logic of the application. 

The current locale is useful, but one might want to specify the locale on
the individual Media data beans in a multilingual application, where more
than one language is needed. For example, you might have an application
that asks the caller which language they want to use, with a voice segment
for each language option. In such an application, you might want to set the
current locale dynamically, after asking the caller which language they
want. This would make the remainder of the application international.

It is up to the application developer how to use the user-defined
variant component of the locale identifier. One might record voice segments
for an application using different voices: male and female, for example.
A developer might identify these voice segments as, for example,
en_US_male and en_US_female.

A developer might have a different greeting at the beginning of an
application, depending on which branch of the company the call is for,
even though the logic of the application is the same for all branches. To
achieve this, a locale identifier is created for each branch (for example,
en_GB_romsey, en_GB_winchester, en_GB_salisbury). Each greeting voice
segment is then named using the appropriate locale identifier, and
different default locales are used to run instances of the application for
each branch. The other voice segments in the application, common to all
branches, should be in the base locale (en_GB).



Whenever a voice segment for a specific variant cannot be found, a 
voice segment for the country and language is searched for. 

In some cases, a developer may want to use a language-only locale
identifier, such as "en" or "fr". One example of this is the use of "en" as
a locale identifier for the voice segments supplied for the tutorial
application. These voice segments can be used when the current language of
an application is any derivative of the en locale: en_US, en_GB, en_AU, and
so on. Whenever a voice segment for a specific country cannot be found, a
voice segment for the generic language is searched for.
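
The fallback search described in the last two paragraphs (variant, then language and country, then language alone) might be sketched as below. The in-memory segment store, the class name and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical resolver: segment name -> (locale identifier -> recorded file).
final class VoiceSegmentResolver {
    private final Map<String, Map<String, String>> segments;

    VoiceSegmentResolver(Map<String, Map<String, String>> segments) {
        this.segments = segments;
    }

    // "en_GB_romsey" -> [en_GB_romsey, en_GB, en]; "en_AU" -> [en_AU, en].
    static List<String> searchOrder(String locale) {
        String[] parts = locale.split("_", 3);
        List<String> order = new ArrayList<>();
        if (parts.length == 3) {
            order.add(parts[0] + "_" + parts[1] + "_" + parts[2]); // language, country, variant
        }
        if (parts.length >= 2) {
            order.add(parts[0] + "_" + parts[1]); // language and country
        }
        order.add(parts[0]); // generic language
        return order;
    }

    // Returns the first recording found along the search order, if any.
    Optional<String> resolve(String segmentName, String locale) {
        Map<String, String> byLocale = segments.getOrDefault(segmentName, Map.of());
        for (String candidate : searchOrder(locale)) {
            String file = byLocale.get(candidate);
            if (file != null) {
                return Optional.of(file);
            }
        }
        return Optional.empty();
    }
}
```

With this order, the branch greeting is found under its variant while the shared segments fall back to en_GB or en, matching the branch example above.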

Because some speech recognition and text-to-speech technologies are
better at some languages than others, one can use different technologies
for processing different languages. You specify one technology for "all
locales" and then override this for the exceptional languages. This can be
specified on a per-node or per-application basis. Specifying the technology
for a new language, or switching between technologies for any language, is
done without altering the application itself.






The current application locale is assumed for all speech recognition
attempts, and is used for text-to-speech unless specified on the
TextToSpeech bean.






One can set the locale to be used while the application is running, 
either on the basis of the calling number or some other criteria (for 
example, a menu that asks the caller which language they want).

The DirectTalk bean has a currentLocale property with corresponding
get and set methods. When makeCall or waitForCall is invoked, the
currentLocale property is set to the default locale.

To change the locale during the application, invoke the setCurrentLocale
method of the DirectTalk bean that initiated the call, passing it the
required locale. This method sets the currentLocale property, and the new
locale is then used by the application. If you are using a visual builder,
you can connect to the currentLocale property of the DirectTalk bean.
Setting the current locale in this way does not change the value of the
default locale in the application.

When the application finishes with a call and starts to handle a new
call, the default locale is used again, rather than a locale that the
application set. To return to the default locale within a call, invoke the
setCurrentLocale method of the DirectTalk bean that initiated the call,
passing it a null parameter. To determine what locale the application is
currently using, invoke the getCurrentLocale method of the DirectTalk bean
that initiated the call: this returns the current locale. You can change
the behavior of the application, if necessary, depending on this value.
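
The current-locale rules above can be modelled with a small class. This is only a hypothetical model of the documented behaviour, not the DirectTalk bean itself; startCall stands in for makeCall or waitForCall.

```java
// Hypothetical model of the current-locale rules; not the DirectTalk bean.
final class CurrentLocaleState {
    private final String defaultLocale;
    private String currentLocale;

    CurrentLocaleState(String defaultLocale) {
        this.defaultLocale = defaultLocale;
        this.currentLocale = defaultLocale;
    }

    // Stands in for makeCall/waitForCall: every new call starts with the default locale.
    void startCall() {
        currentLocale = defaultLocale;
    }

    // Passing null reverts to the default locale; the default itself never changes.
    void setCurrentLocale(String locale) {
        currentLocale = (locale == null) ? defaultLocale : locale;
    }

    String getCurrentLocale() {
        return currentLocale;
    }

    String getDefaultLocale() {
        return defaultLocale;
    }
}
```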






Not all applications are completely language-independent. You might
want to mix the languages spoken by a single application. In this case,
rather than changing the current locale, use the locale property of the
individual media beans to override the current locale.

By default, the locale property of the beans is not set, so that the
current locale is used. This allows you to develop an international
application that automatically adapts itself to the local requirements
(for example, in France, users hear French voice segments and in Britain,
users hear U.K. English voice segments). However, you can override the
default locale by specifying a locale for the bean, for example, to develop
a multilingual message.

Speech recognition in DirectTalk Beans Environment can be used
without hard-coding the speech recognition technology inside the
application. This means that you can switch from one technology to another
without changing the application. In addition, different technologies can
be used for different languages, again without hard-coding anything in the
application. (DirectTalk Beans applications are fully
language-independent.) To achieve this independence from technology and
language, the plug-in class to be used for any individual recognition
attempt is found using RecoService configuration entries and
RecoDefinitions.






A RecoService configuration entry exists for each plug-in on each
operating system platform you use in your DirectTalk Beans Environment
plex. Each voice response system has a NodeName configuration entry, in
which the relevant RecoService entries are identified. A RecoDefinition
contains two items of information: RecoType and locale identifier. RecoType
is a platform-independent name for the speech recognition technology. The
locale identifier identifies the languages for which this technology is to
be used. The locale identifier can be an asterisk (*), indicating "all
locales".






RecoDefinitions can be stored in the NodeName configuration entry.
These definitions apply to all applications run in the node, unless
RecoDefinitions are stored in an application's ApplicationProperties (while
the application is being run from a visual builder or using the jre or java
command) or in the AppName configuration entry (when the application is run
in a node). When an EntryField or Menu bean attempts to use speech
recognition, the current locale of the application is used to find the
appropriate RecoDefinition. The ApplicationProperties or the AppName
configuration entry is searched first, and if a RecoDefinition is found for
the locale, or for "all locales", it is used. If no RecoDefinition is found
for the locale, the NodeName configuration entry is searched.
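
The search order can be sketched as below, assuming each configuration source has already been read into a map from locale identifier (or "*") to RecoType. The class name and RecoType values are hypothetical, and the sketch matches the locale exactly; it does not try language-only fallbacks, which the text does not describe for RecoDefinitions.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the RecoDefinition search order.
final class RecoDefinitionLookup {
    static final String ALL_LOCALES = "*";

    // Application definitions (ApplicationProperties or AppName) are searched
    // first, then the NodeName definitions; in each, the exact locale is tried
    // before the "all locales" entry.
    static Optional<String> findRecoType(String locale,
            Map<String, String> application, Map<String, String> node) {
        for (Map<String, String> definitions : List.of(application, node)) {
            String recoType = definitions.get(locale);
            if (recoType == null) {
                recoType = definitions.get(ALL_LOCALES);
            }
            if (recoType != null) {
                return Optional.of(recoType);
            }
        }
        return Optional.empty();
    }
}
```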

The configuration entries needed for text-to-speech work in exactly
the same way as the configuration entries needed for speech recognition.
Text-to-speech in DirectTalk Beans Environment can be used without
hard-coding the text-to-speech technology inside the application. This
means that you can switch from one technology to another without changing
the application. In addition, different technologies can be used for
different languages, again without hard-coding anything in the
application. (DirectTalk Beans applications are fully
language-independent.) To achieve this independence from technology and
language, the plug-in class to be used for converting text to speech for
any individual TextToSpeech bean is found using TTSService configuration
entries and TTSDefinitions.

The steps of the method of the invention are described with reference
to Figure 2. A caller dials in (step 1.1) from a PBX, and the IVR
application handles the call. A welcome message is played (step 1.2) to
the caller over the open telephone line using defined prompts or text and a
text-to-speech module. An IVR application instruction to initiate a
recognition interaction (such as a field entry bean) is located (step 1.3),
and a speech technology is needed to perform the recognition interaction.
Program control is passed (step 1.4) to the technology selector, which
acquires the environment from the application instruction or from the
application itself and then looks up (step 1.5) an appropriate speech
technology from the hash table. The technology selector creates (step 1.6)
an instance of the selected speech technology. Next, the technology
selector chooses (step 1.7) an appropriate set of parameters for the speech
technology by looking in the parameter hash table based upon the
environment property of the interaction and now also the actual task
property of the interaction. Once the parameters have been chosen, the
speech technology starts the interaction (step 1.8). A voice prompt is
played (step 1.9), e.g. "Please say your ...". The response is received
and passed (step 1.10) from the recogniser back to the application. The
application uses the recognised response (step 1.11), for example to take
the number spoken by the caller and enter it into a field for processing.
The next instruction in the IVR application is acquired (step 1.12), and
the interaction with the caller is continued through the rest of the
application.

The parameters for a speech recognition operation will depend on the
technology in use. Two examples are given here: Philips Speech Processing
(formerly known as Voice Control Systems or VCS) and IBM ViaVoice.

For Philips Speech Processing (PSP), it is necessary to select a
vocabulary file and the sub vocabulary within that file. The vocabulary
file is named according to the following pattern:

    llcctnsv

where:

    ll  is a 2-letter language code
    cc  is a 2-letter primary country code
    t   is a single letter specifying the technology:
            D  Discrete
            U  Discrete/Cut-thru
            A  Alphanumeric
            W  Continuous
    n   is the number of channels per DSP
    s   is ...
    v   is ...

An example is ENUSD82A, which gives 8 channels of discrete
recognition for US English. There are two sub vocabularies within this
vocabulary file.
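
Because the name is positional, it can be assembled mechanically. The following hypothetical helper reproduces the ENUSD82A example; it assumes each field occupies exactly one character position as in the llcctnsv pattern, and passes s and v through unchanged since their meaning is not stated here.

```java
import java.util.Locale;

// Hypothetical helper: assembles a PSP vocabulary file name in the llcctnsv
// pattern, one character per field.
final class PspVocabularyName {
    // D = Discrete, U = Discrete/Cut-thru, A = Alphanumeric, W = Continuous
    private static final String TECHNOLOGY_LETTERS = "DUAW";

    static String build(String language, String country, char technology,
            int channelsPerDsp, char s, char v) {
        if (language.length() != 2 || country.length() != 2) {
            throw new IllegalArgumentException("language and country codes must be 2 letters");
        }
        if (TECHNOLOGY_LETTERS.indexOf(technology) < 0) {
            throw new IllegalArgumentException("unknown technology letter: " + technology);
        }
        if (channelsPerDsp < 1 || channelsPerDsp > 9) {
            throw new IllegalArgumentException("channels per DSP must fit in one digit");
        }
        // s and v are copied through unchanged.
        return (language + country).toUpperCase(Locale.ROOT) + technology + channelsPerDsp + s + v;
    }
}
```

For example, build("en", "US", 'D', 8, '2', 'A') yields ENUSD82A.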



The sub vocabularies contain different sets of words, such as numbers,
simple control words and letters. For example, the ENUSD82A vocabulary has
two sub vocabularies as follows:



VOCAB 1:

    0   yes
    1   no
    2   help
    3   cancel
    4   stop

VOCAB 2:

    0   one
    1   two
    2   three
    3   four
    4   five
    5   six
    6   seven
    7   eight
    8   nine
    9   zero
    10  oh
    11  stop



So for PSP the parameters that are looked up are the vocabulary and
sub vocabulary, and both depend on both the locale and the task. For the
vocabulary, the locale is used to determine the language and country, and
the task to determine discrete, continuous or alphanumeric recognition; the
sub vocabulary will depend on the task and also on the vocabulary, since
vocabularies for different languages, countries etc. may have differing sub
vocabularies.



For the implementation of ViaVoice on IBM's DirectTalk products, the
parameters required to start recognition are an "engine" name and one or
more grammar names.



The engine name selects a specific language model for the recognition,
so, for example, there will be a different engine name for each language,
and for language models within a language that are set up for calls from
land line or mobile telephone networks. The engine names are user
definable and may vary from one installation to the next for the same type
of engine; thus it is important that this information is contained in the
configuration of the installation rather than within the application. The
engine name is clearly dependent on the locale.
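Because engine names are user definable, a minimal sketch keeps them in an installation configuration file keyed by locale. The INI layout, the engine names and the fallback from a full locale such as en_US_ip to en_US when no variant-specific engine is configured are all assumptions for illustration; they are not part of the DirectTalk or ViaVoice products.

```python
import configparser

# Hypothetical installation configuration. Engine names are user definable
# and vary between installations, so they live here, not in the application.
INSTALLATION_CONFIG = """\
[engines]
en_US = usEnglishLandline
en_US_mobile = usEnglishMobile
de_DE = germanLandline
"""


def engine_name(locale: str, config_text: str = INSTALLATION_CONFIG) -> str:
    """Look up the engine for a locale, dropping trailing parts until one matches."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep keys case-sensitive (en_US, not en_us)
    config.read_string(config_text)
    engines = config["engines"]
    parts = locale.split("_")
    while parts:
        key = "_".join(parts)
        if key in engines:
            return engines[key]
        parts.pop()
    raise LookupError(f"no engine configured for locale {locale!r}")


print(engine_name("en_US_mobile"))  # usEnglishMobile
print(engine_name("en_US_ip"))      # usEnglishLandline (falls back to en_US)
```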



The grammar names select which recognition grammars will be used for
the recognition, and so determine the possible results. Multiple grammars
may be used, for example to recognise numbers and help words at the same
time. The grammar names are clearly dependent on the task required.
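The task dependence of the grammars can be sketched as a table from task to a list of grammar names, so that several grammars, such as numbers and help words, are activated together. The task and grammar names and the RecognitionRequest type are hypothetical; the engine name is simply passed in, having been chosen from the locale as described above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical task -> grammar table. A task may need several grammars at
# once, for example numbers plus help words.
TASK_GRAMMARS: dict[str, list[str]] = {
    "enter_account_number": ["numbers", "help"],
    "confirm": ["yes_no", "help"],
}


@dataclass(frozen=True)
class RecognitionRequest:
    engine: str                # chosen from the locale (installation configuration)
    grammars: tuple[str, ...]  # chosen from the task


def build_request(engine: str, task: str) -> RecognitionRequest:
    """Combine a locale-derived engine name with the task's grammar names."""
    if task not in TASK_GRAMMARS:
        raise LookupError(f"no grammars configured for task {task!r}")
    return RecognitionRequest(engine, tuple(TASK_GRAMMARS[task]))


print(build_request("usEnglishLandline", "enter_account_number"))
# RecognitionRequest(engine='usEnglishLandline', grammars=('numbers', 'help'))
```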
In the present embodiment the environment is represented by a locale
string that represents the language, the country and a "variant". The
variant can be used to select, for example, land line, mobile or IP
telephony language models, or regional variations in the language, or a
combination of the two. The application takes information that it knows
about the environment of the call and creates this locale string.
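A locale string of this kind might be built and parsed as below. The sketch assumes an underscore-separated language_country_variant layout and lets the variant carry a regional tag, a call type, or both; the class name, function name, region tag and layout are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locale:
    language: str      # e.g. "en"
    country: str       # e.g. "GB"
    variant: str = ""  # call type, regional variation, or both

    def __str__(self) -> str:
        parts = [self.language, self.country]
        if self.variant:
            parts.append(self.variant)
        return "_".join(parts)

    @classmethod
    def parse(cls, text: str) -> Locale:
        # maxsplit=2 keeps a compound variant such as "scotland_mobile" whole
        language, country, *rest = text.split("_", 2)
        return cls(language, country, rest[0] if rest else "")


def locale_for_call(language: str, country: str,
                    region: Optional[str] = None,
                    call_type: Optional[str] = None) -> Locale:
    """Build a locale from what the application knows about the call."""
    variant = "_".join(part for part in (region, call_type) if part)
    return Locale(language, country, variant)


loc = locale_for_call("en", "GB", region="scotland", call_type="mobile")
print(loc)                            # en_GB_scotland_mobile
print(Locale.parse(str(loc)) == loc)  # True
```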



In an alternative embodiment, some or all of the environment
information is gathered by the system and the locale string is created
automatically. Factors such as the gender, language and local dialect can
be determined by analysis of the caller's voice. The type of telephone from
which the call comes (land line, mobile or Internet telephony) may be
determined from the calling number, which is usually available to a voice
response system.
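Deriving the call type from the calling number might look like the following longest-prefix match. The prefix table is made up (the +999 prefixes are not assigned to any country) and in practice would come from the installation's configuration; a withheld number yields no call type, so the locale variant can simply be left empty.

```python
from __future__ import annotations

from typing import Optional

# Made-up prefix table for illustration only; a real installation would
# configure the prefixes of the networks it actually serves.
CALL_TYPE_PREFIXES = {
    "+999": "landline",
    "+99901": "mobile",
    "+99902": "ip",
}


def call_type_from_number(calling_number: Optional[str]) -> Optional[str]:
    """Return the call type for the longest matching prefix, or None if unknown."""
    if not calling_number:
        return None  # number withheld or unavailable
    # Try longer prefixes first so "+99901" wins over the shorter "+999".
    for prefix in sorted(CALL_TYPE_PREFIXES, key=len, reverse=True):
        if calling_number.startswith(prefix):
            return CALL_TYPE_PREFIXES[prefix]
    return None


print(call_type_from_number("+99901234567"))  # mobile
print(call_type_from_number("+99955555555"))  # landline
print(call_type_from_number(None))            # None
```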



Now that the invention has been described by way of a preferred
embodiment, various modifications and improvements will occur to those
persons skilled in the art. Therefore it should be understood that the
preferred embodiment has been provided as an example and not as a
limitation.
CLAIMS

1. An interactive voice response (IVR) system having:

a plurality of speech technology modules, each module for receiving
speech input from a user or producing speech output;

an application ... a plurality of interactions between a caller and
the speech technology modules;

each interaction having a task property and an interaction
environment property;

... property.

2. An IVR as in claim 1 further comprising a plurality of parameter
sets associated with each speech technology module; ... the selected
speech technology module according to the environment property of the
interaction and the task property of the environment.

3. An IVR as in claim 1 wherein the environment property comprises a
language identifier.

4. An IVR as in claim 1 wherein the environment property comprises a
regional identifier.

5. An IVR as in claim 1 wherein the environment property comprises a
call type identifier.

6. An IVR as in claim 1 wherein the application defines an application
environment property and wherein an environment property of an
interaction takes priority.
ABSTRACT



MANAGEMENT OF SPEECH TECHNOLOGY MODULES
IN AN INTERACTIVE VOICE RESPONSE SYSTEM

This publication relates to the management, in an interactive voice
response system, of a plurality of speech technology modules. In particular
it relates to an apparatus and a method for dynamically determining which
of a plurality of speech technology modules to use during voice interaction
between the system and a user. In prior art IVR systems each speech
technology module is configured for a specific application or task. Most
speech technology modules have different lexicons for the range of
functions, but it is the full lexicon which can determine an engine's
suitability for a language. For instance, one type of speech recognition
engine is preferred for certain languages, whereas IBM viaVoice is a good
general all-rounder. Choosing one speech recognition module according to
application or function alone is not entirely satisfactory, and there is a
need for improvement. The present solution is to select, for each
interaction, one of the speech technology modules from the plurality of
modules to be used by the application, according to the environment
property of the interaction.